Empirical Comparison of Nonparametric Regression Estimates on Real Data

Authors

  • Daniel Jones
  • Michael Kohler
  • Adam Krzyzak
  • Alexander Richter
Abstract

The performance of nine nonparametric regression estimates is compared empirically on ten real data sets. Each data set contains between 7900 and 18000 data points and between 5 and 20 variables. The estimates are kernel, partitioning, nearest neighbor, additive spline, neural network, penalized smoothing spline, local linear kernel, regression tree, and random forest estimates. The main result is a table of the empirical L2 risks of all nine estimates on the evaluation part of each data set. Neural networks and random forests perform best. The data sets are publicly available, so any new regression estimate can be compared with the nine estimates considered in this paper by applying it to the data and computing its empirical L2 risk on the evaluation part of each data set.

AMS classification: Primary 62G08, secondary 62P99.
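The comparison protocol described in the abstract (fit on one part of a data set, then report the empirical L2 risk on the evaluation part) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins rather than any of the paper's data sets, the 400/200 split and k = 5 are arbitrary choices, and the helper names `empirical_l2_risk` and `knn_estimate` are hypothetical. The empirical L2 risk is taken here to be the mean squared prediction error over the evaluation points.

```python
import numpy as np


def empirical_l2_risk(predict, x_eval, y_eval):
    """Mean squared prediction error of `predict` on the evaluation part."""
    residuals = predict(x_eval) - y_eval
    return float(np.mean(residuals ** 2))


def knn_estimate(x_learn, y_learn, k):
    """Return a k-nearest-neighbor regression estimate (Euclidean distance)."""
    def predict(x):
        # Squared distances between every query point and every learning point.
        d2 = ((x[:, None, :] - x_learn[None, :, :]) ** 2).sum(axis=2)
        # Indices of the k closest learning points for each query point.
        nearest = np.argsort(d2, axis=1)[:, :k]
        # Average the responses of those neighbors.
        return y_learn[nearest].mean(axis=1)
    return predict


# Synthetic stand-in data (assumption: NOT one of the paper's data sets).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(600, 5))
y = np.sin(np.pi * x[:, 0]) + x[:, 1] ** 2 + 0.1 * rng.standard_normal(600)

# Hypothetical split into a learning part and an evaluation part.
x_learn, y_learn = x[:400], y[:400]
x_eval, y_eval = x[400:], y[400:]

# Risk of the nearest neighbor estimate and of a constant baseline.
risk_knn = empirical_l2_risk(knn_estimate(x_learn, y_learn, k=5), x_eval, y_eval)
risk_mean = empirical_l2_risk(
    lambda q: np.full(len(q), y_learn.mean()), x_eval, y_eval
)
print("k-NN estimate:", risk_knn)
print("constant (mean) baseline:", risk_mean)
```

Any other estimate can be plugged in by passing a different `predict` function to `empirical_l2_risk`, provided it is fitted only on the learning part.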


Similar resources

Nonparametric Regression Estimation under Kernel Polynomial Model for Unstructured Data

The nonparametric estimation (NE) of the kernel polynomial regression (KPR) model is a powerful tool to visually depict the effect of covariates on the response variable when the data are unstructured and heterogeneous. In this paper we introduce the KPR model, a mixture of nonparametric regression models with a bootstrap algorithm, which is considered in a heterogeneous and unstructured framewo...


Empirical estimates for various correlations in longitudinal-dynamic heteroscedastic hierarchical normal models

In this paper, we first define longitudinal-dynamic heteroscedastic hierarchical normal models. These models can be used to fit longitudinal data in which the dependency structure is constructed through a dynamic model rather than observations. We discuss different methods for estimating the hyper-parameters. Then the corresponding estimates for the hyper-parameter that causes the association...


A Comparison of Thin Plate and Spherical Splines with Multiple Regression

Thin plate and spherical splines are nonparametric methods suitable for spatial data analysis. Thin plate splines yield efficient, practical, and high-precision solutions in spatial interpolation. Two components are considered in the model fitting: spatial deviations of the data and the model roughness. On the other hand, in parametric regression, the relationship between explanatory and response v...


Asymptotics of nonparametric L-1 regression models with dependent data.

We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile est...


THE COMPARISON OF TWO METHOD NONPARAMETRIC APPROACH ON SMALL AREA ESTIMATION (CASE: APPROACH WITH KERNEL METHODS AND LOCAL POLYNOMIAL REGRESSION)

Small area estimation is a technique used to estimate parameters of subpopulations with small sample sizes. It is needed to obtain information on a small area, such as a sub-district or village. In many cases, small area estimation uses parametric modeling. But in fact, a lot of models have no linear relationship between the small area average and the covariat...



Journal title:
  • Communications in Statistics - Simulation and Computation

Volume 45  Issue 

Pages  -

Publication date 2016